Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test route url fix #66

Closed
wants to merge 3 commits into from

Conversation

sleshchenko
Copy link
Member

@sleshchenko sleshchenko commented Aug 8, 2019

The following PR has the following separated changes:

  1. Update logrus from old 0.11.5 (Mar 14, 2017) to the latest v1.4.2 (May 18, 2019). It makes easier to debug che-operator if needed by enabling method name (more see https://github.com/sirupsen/logrus#logging-method-name)
  2. Fix fetching test route URL.
    There was a typo in test route assignment since := declare new variable with limited scope, and = assign new value to exist variable.
    Also, fetching is improved to make waiting for route host non-blocking.

PB logrus v1.4.2 https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20764

@sleshchenko sleshchenko added the bug Something isn't working label Aug 8, 2019
@sleshchenko sleshchenko self-assigned this Aug 8, 2019
@sleshchenko sleshchenko force-pushed the testRouteURLFix branch 2 times, most recently from 1848288 to 736d9cf Compare August 8, 2019 08:59
@nickboldt
Copy link
Contributor

This issue needs a milestone... is it blocker for 7.0? P1 for 7.1?

Also I'm not sure I can review this. Might be better to let @davidfestal review when he's back from PTO next week.

@nickboldt
Copy link
Contributor

nickboldt commented Aug 8, 2019

as to CQs, the rule is basically...

  • if you depend on something that you SHIP (RUN-TIME), or something you need at BUILD-TIME only, you probably need a CQ.

You can search for existing CQs [0] for current/previous versions of your deps here:

[0] https://dev.eclipse.org/ipzilla/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=logrus&long_desc_type=substring&long_desc=&bug_file_loc_type=allwordssubstr&bug_file_loc=&keywords_type=allwords&keywords=&emailassigned_to1=1&emailtype1=substring&email1=&emailassigned_to2=1&emailreporter2=1&emailcc2=1&emailtype2=substring&email2=&bugidtype=include&bug_id=&chfieldfrom=&chfieldto=Now&chfieldvalue=&cmdtype=doit&order=Reuse+same+sort+as+last+time&field0-0-0=noop&type0-0-0=noop&value0-0-0=

Looks like logrus 1.4.2 is already in there [1], so you just need to submit a Piggyback CQ for Che [2], like the Codewind team did:

[1] https://dev.eclipse.org/ipzilla/show_bug.cgi?id=20621

[2] https://www.eclipse.org/projects/handbook/#pmi-commands-cq

If you are adding new versions of deps, or new deps, repeat the above process for each new dep.

@sleshchenko sleshchenko force-pushed the testRouteURLFix branch 2 times, most recently from dc390ef to 410e9d9 Compare August 29, 2019 12:51
time.Sleep(time.Duration(1) * time.Second)
testRoute := r.GetEffectiveRoute(instance, "test")
requestURL = "https://" + testRoute.Spec.Host
return nil, errors.New("Unable to get test route host for fetching certificate")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to be sure that the operator won't restart processing the custom resource immediately, you could also return the following:

return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm not able to return reconcile.Result without modifying the signature. It seems to be a helper class and return here seems not good, at least there is no such any method in this class.

Do you think it is OK to send reconcile.Result back though k8s_helper.go#GetEndpointTlsCrt -> create.go#CreateTLSSecret -> che_controller.go#Reconcile ?

Or the following fragment in che_controller would be good enough?

if err := r.CreateTLSSecret(instance, "", "self-signed-certificate"); err != nil {
  return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err
}

But it would mean that we wait one second on any error, like failed to get namespace, failed to create route, failed to fetch host...
Then why we do not wait 1 second in other places like: when we failed to create service account https://github.com/eclipse/che-operator/blob/9682f3448fe240c216aa22d9d3f56cc056b01494/pkg/controller/che/che_controller.go#L315-L317

BTW Seems that even with return reconcile.Result{}, err there is ~ 1 second between k8s_helper.go#GetEndpointTlsCrt invocations
Screenshot_20190905_104935
the first one - is the first invocation
2nd and 3rd are the second invocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm not able to return reconcile.Result without modifying the signature.

Oh, yes, I didn't pay attention to the fact it was inside a utility function.
You might simply add a retryLater boolean return value, and use it in the che_controller reconcile function.

Then why we do not wait 1 second in other places like: when we failed to create service account

Well the existing logic in the controller (that was already there when I started on it) seems to be the following:

  • return only an error when it is an unexpected error,
  • return a Result (with retry) + error when it's an error that we expect in some scenarios and on which we explicitly want to restart the reconciliation after a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that we should retry if quite unexpected exception occurred like failed to create route?
It's unexpected by maybe k8s/openshift API was temporary unavailable...
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if the CreateTLSSecret function returns continueLater with true, then the calling reconcile method should return reconcile.Result{Requeue: true, RequeueAfter: time.Second * 1}, err else it should return nil, err if an unexpected error occured.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I understand it, let me rephrase my question:
CreateTLSSecret may returns error in different situations:

  1. Failed to get an existing self-signed secret.
  2. Failed to get existing route (like when API is not available)
  3. Failed to create new route (like when API is not available)
  4. The fetched route does not have host (it's clear that we should retry in such case)
  5. Failed to request route host to retrieve TLS certificate
  6. Failed to create a secret with fetched TLS self-signed certificate.

Which of this errors you consider as unexpected and which expected (like 4 about route) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me you would only use continueLater=true in situation 4. We know that in some cases the host can take some time to be added to the route object. It's expected.

Possibly 5 could also be expected (and return continueLater=true, since sometimes the route may take some time to be setup.

But main point is the 4 IMO

Does it seem consistent to you ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this PR brings some fix and does not do code base worse.
Your proposed solution totally makes sense but it's more about refactoring the current architecture but not about fixing the issue I initially tried to use.
Sorry, but I'm not able to invest now in operator architecture.

@davidfestal If you agree that this PR is OK to merge - please approve.
If you think that it brings more issues than solves - feel free to close it.

@che-bot
Copy link
Contributor

che-bot commented Feb 4, 2020

Can one of the admins verify this patch?

Signed-off-by: Sergii Leshchenko <sleshche@redhat.com>
Signed-off-by: Sergii Leshchenko <sleshche@redhat.com>
@tolusha
Copy link
Contributor

tolusha commented Apr 1, 2020

Will done as part of refactoring
eclipse-che/che#15127

@tolusha tolusha closed this Apr 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants